Abstract


The cost and safety goals for NASA’s next generation of reusable launch vehicle (RLV) will 
require that rapid high-fidelity aerothermodynamic design tools be used early in the design 
cycle. To meet these requirements, it is desirable to identify adequate statistical models that 
quantify and improve the accuracy, extend the applicability, and enable combined analyses 
using existing prediction tools. The initial research work focused on establishing suitable 
candidate models for these purposes. The second phase focuses on assessing how accurately these
models predict the heat rate for a given candidate data set.
This validation work compared models and methods that may be useful in predicting the heat 
rate. 

Introduction 

There are two phases to this project: (1) model development and (2) model validation. The
approach was aimed at identifying statistical/mathematical models that best characterize and/or
model a set of sphere stagnation data (see Table 1). Once several candidate models were
identified, they were tested in the second phase to see whether they could predict heat rate
values for a smaller candidate data set within the normal trajectory corridor.

In the initial phase, five models for predicting heating rate within the entry corridor were
identified. The adequacy of the models was compared using R² (the coefficient of
determination), adjusted R², the standard error and the F-statistic. In addition, graphical
techniques such as 3-D plots, contour plots, normal probability plots and residual analysis were
used to better understand the relationship between the dependent and predictor variables. The
models investigated included multiple linear regression, polynomial regression (quadratic and
cubic), classification and regression trees (CART), Loess and Kriging.
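All four adequacy measures named above can be computed from an ordinary least squares fit. The following is a minimal NumPy sketch that assumes a model with an intercept. The function name `fit_summary` and the toy data are hypothetical; they are not the study's data or software.

```python
import numpy as np


def fit_summary(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares and return
    R^2, adjusted R^2, the fit standard error S, and the overall F-statistic."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                          # single predictor -> column
    n, p = X.shape                              # n measurements, p predictors
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                  # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    ssr = sst - sse                             # regression SS (valid with intercept)
    df_res = n - p - 1
    return {
        "R2": 1.0 - sse / sst,
        "adj_R2": 1.0 - (sse / df_res) / (sst / (n - 1)),
        "S": float(np.sqrt(sse / df_res)),
        "F": (ssr / p) / (sse / df_res),
    }


# Toy example with one predictor and four points (hypothetical values).
print(fit_summary([1, 2, 3, 4], [1, 2, 2, 4]))
```

Adjusted R² penalizes extra predictors. For that reason it is the more useful of the two R² measures when comparing models with different numbers of terms.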

The Loess and Kriging models were also explored during the second phase. Much of the 
initial phase work is summarized in a paper presented at the Annual Joint Statistical Meetings 
in August 2002 in New York City and is included in the Appendix. 

The second phase was directed at model validation. Performance comparisons were
conducted on a candidate data set to determine which of the identified models predicted the
dependent variable, heat rate, with the smallest overall percent error. In this approach, the
models identified in the initial phase were used to predict heating rate for a smaller sphere
stagnation test data set. An error analysis then determined which model produced the
smallest overall error.
Validation Approach


In this section, the validation of the models identified for the corridor data set is
described (see the description in the Proceedings paper in the Appendix). The validation
process consisted of the following steps:

(1) Apply the five identified models/methods to the candidate data set to obtain the predicted
heat rates for each model;

(2) Calculate the percent error, 100 × (actual - predicted)/actual, for each measurement in
the data set and for each model;

(3) Identify the model or method that consistently yields the smallest percent error. For
purposes of this study, an absolute percent error below 5% is considered acceptable.
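Steps (2) and (3) reduce to a few lines of array arithmetic. This sketch assumes that the 5% criterion applies to the magnitude of the percent error, which is consistent with the signed errors reported in Table 2. The model names and heat rate values are hypothetical and are not the study's results.

```python
import numpy as np


def percent_error(actual, predicted):
    """Step (2): 100 * (actual - predicted) / actual for each measurement."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * (actual - predicted) / actual


def count_exceedances(actual, predictions_by_model, limit=5.0):
    """Step (3): number of measurements whose |percent error| exceeds the limit."""
    return {name: int(np.sum(np.abs(percent_error(actual, pred)) > limit))
            for name, pred in predictions_by_model.items()}


# Hypothetical heat rates for illustration only (not the Table 2 values).
actual = [30.0, 45.0, 60.0, 4.0]
predictions = {
    "model_A": [30.6, 44.1, 61.0, 4.1],
    "model_B": [33.0, 45.2, 59.0, 6.0],
}
print(count_exceedances(actual, predictions))   # -> {'model_A': 0, 'model_B': 2}
```

Small actual heat rates inflate percent errors. An error of 2 BTU/ft²/s on a value of 4 is a 50% error, which may explain why large percent errors tend to cluster at the low-heating end of a data set.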

Description of Other Models 
Loess 

Loess was introduced by W. S. Cleveland (1979). The name denotes a method that is more
descriptively known as locally weighted polynomial regression. It is one of many modern
modeling methods that build on classical methods such as linear and nonlinear least squares
regression. Modern regression methods are designed to address situations in which the
classical procedures do not perform well. Loess combines much of the simplicity of linear
least squares regression with the flexibility of nonlinear regression.

Assume that for i = 1, ..., n, the i-th measurement $y_i$ of the response variable y and the
corresponding measurement $\mathbf{x}_i$ of the vector $\mathbf{x}$ of p predictors are related by

$$y_i = g(\mathbf{x}_i) + \varepsilon_i,$$

where g is the regression function and $\varepsilon_i$ is the random error. The idea of local
regression is that near $\mathbf{x} = \mathbf{x}_0$, the regression function $g(\mathbf{x})$ can be
locally approximated by the value of a function in some specified parametric class. Such an
approximation is obtained by fitting a regression surface to the data points within a chosen
neighborhood of the point $\mathbf{x}_0$.

In the Loess method, weighted least squares is used to fit linear or quadratic functions of
the predictors at the centers of neighborhoods. The radius of each neighborhood is chosen so
that the neighborhood contains a specified fraction of the data points. This fraction, called
the smoothing parameter, controls the smoothness of the estimated surface: larger values of
the smoothing parameter produce smoother response functions. Data points in a given local
neighborhood are weighted by a smooth decreasing function of their distance from the center
of the neighborhood.
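The following minimal sketch shows the local fit for a single predictor. It assumes the tricube weight function commonly used with Loess and distinct x values. The study applied Loess to several predictors, and the function name `loess_point` is hypothetical.

```python
import numpy as np


def loess_point(x, y, x0, span=0.5, degree=1):
    """Locally weighted polynomial regression (Loess) estimate of g(x0)
    for a single predictor.

    span   -- smoothing parameter: fraction of the data in each neighborhood
    degree -- 1 for locally linear, 2 for locally quadratic fits
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Neighborhood size; at least degree + 2 points, because the farthest
    # neighbor receives weight zero under the tricube function.
    k = min(n, max(degree + 2, int(np.ceil(span * n))))
    dist = np.abs(x - x0)
    radius = np.sort(dist)[k - 1]                      # neighborhood radius
    u = dist / radius
    w = np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)      # tricube weights
    # Weighted least squares on powers of (x - x0); the intercept is g(x0).
    A = np.vander(x - x0, degree + 1, increasing=True)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(coef[0])


x = np.arange(10.0)
print(loess_point(x, 2.0 * x + 1.0, x0=3.3))   # a straight line is reproduced: about 7.6
```

Repeating this calculation over a grid of x0 values traces out the smoothed curve. Raising `span` widens each neighborhood and gives a smoother result.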

In this analysis, the Loess method was applied to the sphere test data set of 15 measurements 
and used to predict the associated heat rate values. The results are shown in Table 2. 
Kriging Method

Kriging is an interpolation method named after the South African engineer D. G. Krige, who
developed the method in an attempt to predict ore reserves more accurately. Kriging is
based on the assumption that the parameter being interpolated can be treated as a 
regionalized variable. A regionalized variable is intermediate between a truly random 
variable and a completely deterministic variable. It is assumed to vary in a continuous way 
from one location to the next and points that are near each other have a certain degree of 
spatial correlation. The points that are widely separated are assumed to be statistically 
independent (Cressie, 1993). 

There are several Kriging methods including Simple, Ordinary, Zonal and Universal Kriging. 
Only the ordinary Kriging method is reviewed in this report, to provide further
understanding of how the method works. The first step in ordinary Kriging is to construct a
variogram from the data set to be interpolated. A variogram consists of two parts: an
experimental variogram and a model variogram. Suppose the value to be interpolated is
referred to as f. The experimental variogram is found by calculating the variance (v) of each
point in the data set with respect to each of the other points and plotting the variance versus
the distance (h) between the points.
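The experimental variogram step can be sketched as follows. The sketch uses the conventional semivariance, one-half of the squared difference between two values, and averages it within distance bins. The binning scheme and the function name are assumptions for illustration; they do not describe the internals of the Surfer software used in the study.

```python
import numpy as np


def experimental_variogram(coords, f, bin_edges):
    """Average semivariance 0.5*(f_i - f_j)^2 over all point pairs,
    grouped into distance bins [bin_edges[k], bin_edges[k+1])."""
    coords = np.asarray(coords, dtype=float)
    f = np.asarray(f, dtype=float)
    i, j = np.triu_indices(len(f), k=1)                  # each pair counted once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)    # separation distance
    gamma = 0.5 * (f[i] - f[j]) ** 2                     # pairwise semivariance
    lags, values = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (h >= lo) & (h < hi)
        if in_bin.any():                                 # skip empty bins
            lags.append(h[in_bin].mean())
            values.append(gamma[in_bin].mean())
    return np.array(lags), np.array(values)


pts = [[0.0], [1.0], [2.0], [3.0]]                       # hypothetical 1-D locations
lags, gam = experimental_variogram(pts, [0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 2.5, 3.5])
print(lags, gam)
```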

After the experimental variogram is computed, the next step is to define a model variogram.
This variogram is a mathematical function that models the trend in the experimental
variogram. Once the model variogram is determined, it is used to compute the weights used
in Kriging. The basic equation used in ordinary Kriging is as follows:

$$F(x,y) = \sum_{i=1}^{n} w_i f_i$$

where n is the number of scatter points in the data set, $f_i$ are the values of the scatter points,
and $w_i$ are the weights assigned to each scatter point. The Kriging method was implemented
using the Surfer 8 software package and applied to the sphere test data set to predict heat rate
values. The results of the analysis are provided in Table 2.
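The weights in the equation above come from solving a small linear system built from the model variogram. In ordinary Kriging the weights are constrained to sum to one, and a Lagrange multiplier enforces that constraint. The sketch below assumes a spherical model variogram with hypothetical nugget, sill and range values. The variogram model and settings actually used in this study are not stated.

```python
import numpy as np


def spherical_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Spherical model variogram: gamma(0) = 0, gamma = sill beyond the range."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / rng, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h == 0.0, 0.0, gamma)


def ordinary_kriging(coords, f, target, variogram):
    """Return F = sum_i w_i f_i at `target` and the weights w_i.

    Solves  sum_j w_j gamma(x_i, x_j) + mu = gamma(x_i, x0),  i = 1..n
            sum_j w_j = 1
    where mu is a Lagrange multiplier enforcing unbiasedness.
    """
    coords = np.asarray(coords, dtype=float)
    f = np.asarray(f, dtype=float)
    n = len(f)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    w = np.linalg.solve(A, b)[:n]
    return float(w @ f), w


def model(h):
    # Hypothetical variogram parameters for illustration.
    return spherical_variogram(h, nugget=0.0, sill=1.0, rng=3.0)


corners = [[0, 0], [1, 0], [0, 1], [1, 1]]             # hypothetical scatter points
estimate, weights = ordinary_kriging(corners, [1.0, 2.0, 3.0, 4.0], [0.5, 0.5], model)
print(estimate, weights.sum())   # symmetric target: about 2.5 and 1.0
```

Ordinary Kriging is an exact interpolator. Evaluated at a data point, it returns that point's value.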


Surface-Fitting Method 

Table Curve 3D, marketed by SPSS of Chicago, IL, is mathematical software for Windows that
fits a library of frequently encountered built-in models to data and ranks them. In a highly
automated process, Table Curve 3D allows the user to quickly review the best fits for
three-dimensional data using predefined equation sets. The fitting options include, but are not
limited to, the following: Surface-Fit All, Surface-Fit Linear Equations, Surface-Fit Simple
Equations, Surface-Fit Robust Plane, Surface-Fit Polynomial Equations and Surface-Fit
Rational Equations.
With any one of the above options, Table Curve 3D ranks the predefined equation set
according to the following statistical measures: R² (coefficient of determination), DF
(degrees-of-freedom) adjusted R², fit standard error and F-value. Table Curve 3D was
applied to the sphere test data set to predict heat rate values. The results of the analysis are
provided in Table 2.
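Fitting and ranking a set of predefined surface equations can be illustrated on a small scale. The three-equation library below is hypothetical; Table Curve 3D's own library is far larger, and its ranking combines several statistics. This sketch ranks by degrees-of-freedom adjusted R² only.

```python
import numpy as np

# A tiny, hypothetical equation library: each entry returns the basis columns.
CANDIDATES = {
    "plane": lambda x, y: [np.ones_like(x), x, y],
    "plane + xy": lambda x, y: [np.ones_like(x), x, y, x * y],
    "full quadratic": lambda x, y: [np.ones_like(x), x, y, x * y, x**2, y**2],
}


def rank_surfaces(x, y, z):
    """Fit each candidate z = f(x, y) by least squares and rank by
    degrees-of-freedom adjusted R^2 (best first)."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    n = len(z)
    sst = np.sum((z - z.mean()) ** 2)
    ranked = []
    for name, basis in CANDIDATES.items():
        A = np.column_stack(basis(x, y))
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)
        sse = np.sum((z - A @ coef) ** 2)
        k = A.shape[1]                                   # fitted parameters
        r2 = 1.0 - sse / sst
        adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
        ranked.append((adj_r2, r2, name))
    return sorted(ranked, reverse=True)


gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
x, y = gx.ravel(), gy.ravel()
z = 1.0 + 2.0 * x - y + 0.5 * x**2                       # hypothetical surface
for adj_r2, r2, name in rank_surfaces(x, y, z):
    print(f"{name:15s} adj R^2 = {adj_r2:.4f}")
```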

Prediction 

Table 1 lists the 15 measurements used to assess the models relating the response variable,
heat rate, to 10 predictor variables. The actual and predicted heat rate values and the
calculated percent errors are provided in Table 2 below.


Table 1. Sphere Test Data Set 


Velocity (ft/sec) | Twall (deg R) | Pressure (lb/ft²) | Density (slugs/ft³) | Temp (deg R) | Mach | Dyn. Pres. (lb/ft²) | Reynolds (per ft) | Energy (ft³/sec) | Heat Rate (BTU/ft²/s)
2.500E+04 | 3.000E+03 | 5.84E-03 | 1.01E-08 | 3.36E+02 | 2.78E+01 | 3.16E+00 | 9.65E+02 | 1.58E+05 | 3.07E+01
2.350E+04 | 3.000E+03 | - | 1.33E-08 | 3.36E+02 | 2.61E+01 | 3.66E+00 | 1.19E+03 | 1.72E+05 | 4.72E+01
2.100E+04 | 3.000E+03 | - | - | - | - | 2.52E+01 | 8.18E+03 | - | 5.84E+01
1.750E+04 | - | - | - | - | - | 5.00E+01 | - | - | 5.38E+01
1.300E+04 | - | - | - | - | - | 7.20E+01 | - | - | 3.11E+01
9.000E+03 | 3.000E+03 | 1.77E+00 | 2.11E-06 | - | 8.32E+00 | 8.55E+01 | 5.34E+04 | 1.54E+06 | -
6.500E+03 | 3.000E+03 | 4.65E+00 | 5.88E-06 | 4.60E+02 | 6.18E+00 | 1.24E+02 | 1.12E+05 | 1.62E+06 | 3.17E+00
4.500E+03 | - | - | 1.88E-05 | - | - | 1.90E+02 | - | - | 3.88E+00
2.540E+04 | - | - | 5.73E-09 | - | - | 1.85E+00 | - | - | -
2.250E+04 | - | - | 1.01E-08 | - | - | 2.56E+00 | - | 1.15E+05 | 4.10E+01
2.050E+04 | - | - | 1.58E-07 | - | - | 3.33E+01 | - | 1.36E+06 | -
1.450E+04 | - | 3.35E-01 | 4.50E-07 | - | 1.42E+01 | 4.74E+01 | 2.02E+04 | 1.37E+06 | 3.32E+01
1.200E+04 | 3.000E+03 | 1.10E+00 | 1.34E-06 | - | 1.12E+01 | 9.64E+01 | 4.59E+04 | 2.31E+06 | 2.92E+01
6.500E+03 | 3.000E+03 | - | 4.26E-06 | - | 6.11E+00 | 9.01E+01 | 7.99E+04 | 1.17E+06 | 2.72E+00
3.250E+03 | 3.000E+03 | - | 2.63E-05 | - | 3.27E+00 | 1.39E+02 | 2.75E+05 | 9.02E+05 | -
Table 2. Performance Comparison of Five Models


5V Model CART 


22.88 


45.54 


63.89 


51.03 


26.9 




Loess 


47.57 


58.38 

53.47 


31.06 


13.36 


-43.70% 


562.46% 


-108.10% 


-45.10% 


59.16 


55.21 


30.91 



1.00% 

-4.90% 

-0.70% 

-0.30% 

0.00% 

-1.30% 

0.60% 


-1.10% 

-95.30% 

12.70% 


53.90% 


-0.40% 


-0.20% 

-1.30% 

1.10% 


0.40% 



CART Table Curve 


4.90% 


-23.20% 


Kriging 


31.24 


47.88 


58.23 


55.13 


32.39 


12.81 


6.13 


3.15 


24.02 


42.75 


63.82 


33.89 


28.25 


5.18 


0.34 


-1.66%


18.81% 


1.39% 


4.34% 


0.48% 


-1.95% 


3.22% 


90.44% 


42.37% 


Loess 
Conclusions


Five models and/or methods were considered in the validation phase: classical
multiple regression, classification and regression trees (CART), Table Curve, Kriging
and Loess. Based on the performance criteria and the results in Table 2, Kriging
performed better than the other methods in predicting heat rates for the Sphere Test Data
Set in Table 1. Only three (3) of the Kriging predicted heat rate values exceeded the
maximum 5% error (see Table 2). Loess and Table Curve each had five values that
exceeded the 5% maximum error criterion, as shown in Table 2. CART was the least
effective at predicting heat rate values for the Sphere Test Data Set.
Multiple Regression Modeling of
Aerothermal Data Sets 

Douglas DePriest and Carolyn Morgan 
Mathematics Department, Hampton University, 
Hampton, Virginia 23668 
This paper describes several regression analysis
techniques used to analyze and model spherical
stagnation heating data sets observed in aerothermal
experiments. The objective is to identify statistical
models that are suitable for modeling and prediction
with spherical stagnation heating data. The techniques
include classical multiple regression, best subset
selection, and classification and regression trees
(CART), supported by contour plots, 3-D plots and
other graphical methods such as residual plots.
Comparisons are made to facilitate model selection.

Introduction 

Recent designed experiments at NASA have
demonstrated the need to consider higher-fidelity
aerodynamic heating analysis early in the design cycle.
In one such experiment, the vehicle shape was optimized
for aerodynamic performance, which resulted in severe
aerodynamic heating that forced a costly redesign of the
nose and wing surfaces and lowered flight margins.

The availability of higher fidelity aerothermal analysis 
earlier in the design cycle could have prevented some 
of these problems. 

This project will have two phases: model
development and model validation. The initial
phase will explore and/or develop the statistical
and mathematical methods that can be used to
transform the pointwise aeroheating predictions of
current tools into complete aerothermal
environments throughout a trajectory corridor. The
approach is intended to identify statistical and
mathematical models that best characterize and/or
model a set of sphere stagnation data. Once several
acceptable models and methods have been identified,
they will be tested in the second phase to see whether
they can predict heating values within the normal
trajectory corridor to within a specified error.

Description of Sphere Stagnation Data Sets 

There were two data sheets for analysis: one containing
the full trajectory space (1269 measurements) and the
second containing measurements only within the entry
corridor (138 measurements). For these data sets, a
sphere shape configuration is used and the measurements 
are taken at the stagnation point on the sphere. For 
simplicity, a sphere of one foot radius is assumed. 
Generally, as the sphere radius decreases, the heat rate 
measurements will increase. 

For each case, eleven (11) variables are labeled on the 
worksheets. The first three variables (altitude, velocity, 
and wall temperature) are considered the basic 
independent variables. The other variables are derived 
from these basic three variables. For example, the 
density, pressure, and temperature are direct functions of 
altitude. The Mach number, dynamic pressure, Reynolds 
number, and energy variables are combinations of the 3 
basic variables; e.g., dynamic pressure equals
0.5*density*velocity*velocity. The response variable of
interest is heat rate.
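The derived variables can be recomputed from velocity, density and temperature with standard relations for air. This minimal sketch assumes a ratio of specific heats of 1.4 and a gas constant of 1716 ft·lbf/(slug·°R). The function name is hypothetical.

When applied to the first row of Table 1, Sphere Test Data Set (velocity 2.500E+04 ft/sec, density 1.01E-08 slugs/ft³, temperature 336 deg R), the sketch reproduces the tabulated dynamic pressure (3.16 lb/ft²) and Mach number (27.8). It also matches the tabulated pressure (5.84E-03 lb/ft²) to within about 0.3%. The paper does not say how Reynolds number and energy were computed, so those variables are omitted.

```python
import math

GAMMA = 1.4        # ratio of specific heats for air (assumed)
R_AIR = 1716.0     # gas constant for air, ft*lbf/(slug*deg R) (assumed)


def derived_variables(velocity, density, temperature):
    """velocity in ft/sec, density in slugs/ft^3, temperature in deg R."""
    dynamic_pressure = 0.5 * density * velocity**2            # lb/ft^2
    speed_of_sound = math.sqrt(GAMMA * R_AIR * temperature)    # ft/sec
    mach = velocity / speed_of_sound
    pressure = density * R_AIR * temperature                   # ideal gas law, lb/ft^2
    return dynamic_pressure, mach, pressure


q, mach, p = derived_variables(2.5e4, 1.01e-8, 336.0)
print(f"q = {q:.3g} lb/ft^2, Mach = {mach:.3g}, p = {p:.3g} lb/ft^2")
# -> q = 3.16 lb/ft^2, Mach = 27.8, p = 0.00582 lb/ft^2
```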

The heating rate values in the spreadsheet were generated 
using simple stagnation point calculations. The data sets 
contain values for velocity and altitude but no
angle-of-attack values because of the sphere assumption.
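The observation that heat rate increases as sphere radius decreases can be illustrated with the widely used Sutton-Graves form of stagnation-point heating. In that form, heat rate scales with the square root of density divided by nose radius, multiplied by velocity cubed. The paper does not say which stagnation-point calculation generated the spreadsheet values. This sketch is therefore an illustrative stand-in with a hypothetical constant `k`, not the method behind the data. Because it ignores wall-temperature effects, it will not reproduce the Table 1 heat rates.

```python
import math


def stagnation_heat_rate(density, velocity, nose_radius, k=1.0):
    """Sutton-Graves-type scaling: q = k * sqrt(rho / R_n) * V**3.

    k is a placeholder constant whose value depends on the gas and the unit
    system; wall-temperature effects are ignored.
    """
    return k * math.sqrt(density / nose_radius) * velocity**3


q_1ft = stagnation_heat_rate(1.01e-8, 2.5e4, nose_radius=1.0)
q_half_ft = stagnation_heat_rate(1.01e-8, 2.5e4, nose_radius=0.5)
print(q_half_ft / q_1ft)   # halving the radius raises q by sqrt(2), about 1.414
```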

Analysis Methods 

Our goal is to begin with some rudimentary analysis of
these types of data sets to explore their behavior and
relationships using correlations, summary statistics,
graphics (including contour plots and 3-D plots), and
robust and residual analyses. Several regression models were
investigated including multiple linear regression, 
classification and regression trees (CART), best subset, 
quadratic and cubic regression models as well as some 
graphical methods. Regression analysis allows one to 
model the relationship between a response variable and 
one or more predictor variables. One of the useful 
features of a regression model is that it can be used to 
predict or estimate a future response value based on a 
given set of values of the predictor variables. 

Regression analysis results usually include the following: 
regression equation, predictor table, summary statistics, 
ANOVA Table, list of unusual observations, contour 
plots, 3-D plots and residual plots. To use the t-test,
F-test and associated confidence intervals appropriately,
the data must meet certain conditions: (1) the residuals
(error component) are normally distributed, (2) the
variance is constant (homoscedasticity), and (3) the
measurements are independent. Unusual patterns in the
residuals, revealed through residual analysis, may
indicate underlying weaknesses in the model.
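These three conditions are usually checked from the residuals. The following sketch computes a few numerical companions to the residual plots: the Durbin-Watson statistic for serial correlation, a count of large standardized residuals, and a crude check for non-constant variance. These statistics are illustrative choices, not statistics reported in this paper, and the function name is hypothetical. The Durbin-Watson statistic is meaningful only if the rows are in a natural order, such as along the trajectory.

```python
import numpy as np


def residual_diagnostics(X, y):
    """Simple OLS residual checks for independence, normality/outliers and
    constant variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    e = y - fitted
    s = np.sqrt(e @ e / (len(y) - A.shape[1]))     # fit standard error
    return {
        # Independence: a value near 2 suggests no first-order serial correlation.
        "durbin_watson": float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2)),
        # Normality / outliers: standardized residuals beyond +/-2.
        "n_beyond_2s": int(np.sum(np.abs(e / s) > 2.0)),
        # Constant variance: a clearly positive correlation between |residual|
        # and fitted value hints that the error spread grows with the response.
        "abs_resid_vs_fitted": float(np.corrcoef(fitted, np.abs(e))[0, 1]),
    }


print(residual_diagnostics([1, 2, 3, 4], [1, 2, 2, 4]))
```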



Data Exploration 

Unless otherwise indicated, tables and figures
appear in the Appendix. First, the summary
statistics for the independent and response
variables were computed.

Prior to the model building activities, graphical 
methods were used to help identify any underlying 
relationships between the variables being studied. In 
particular, matrix plots of cross-graphs of the variables 
were generated. These plots showed: 

• Apparent quadratic relationships between heat
rate and altitude and between heat rate and temperature

• Strong linear relationships between Mach number and
altitude and between velocity and altitude

• A possible exponential relationship between heat
rate and Reynolds number.

Next, correlation matrices were developed, giving
the Pearson correlations and corresponding p-values.
Given the inherent relationships among the
independent (predictor) variables, a principal
components analysis was performed to define a set
of orthogonal variables such that the first principal
component accounts for the largest possible share of
the total dispersion in the data, the second principal
component accounts for the second largest share, and
so on. This helps to identify a subset of candidate
predictor variables for the analysis.
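The principal components step can be sketched with a singular value decomposition. The predictors are standardized first. This is an assumption, since the paper does not say whether the correlation or covariance matrix was used, but it keeps variables such as density and Reynolds number, which differ by many orders of magnitude, on an equal footing. The data below are hypothetical.

```python
import numpy as np


def principal_components(X):
    """Return the share of total variance explained by each principal
    component and the component loadings (one row per component)."""
    X = np.asarray(X, dtype=float)
    # Standardize each predictor (zero mean, unit sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    variance = s**2 / (len(X) - 1)        # eigenvalues of the correlation matrix
    return variance / variance.sum(), vt


t = np.arange(10.0)
# Two perfectly collinear columns plus one other (hypothetical data).
X = np.column_stack([t, 2.0 * t + 1.0, np.sin(t)])
explained, loadings = principal_components(X)
print(np.round(explained, 3))
```

Because two of the three columns carry the same information, the last component explains essentially none of the variance. This is the kind of redundancy among derived predictors that the analysis is meant to expose.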

Another method that was used to identify a subset list 
of candidate predictor variables is best subset selection. 
Best subset selection identified altitude, velocity, mach, 
dynamic pressure and Reynolds as the top five 
prediction variables. These are included in the 
regression model that is in the Appendix. 
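Best subset selection can be sketched as an exhaustive search. With 10 candidate predictors this means 2^10 - 1 = 1023 least squares fits, which run quickly. This sketch keeps the subset with the highest adjusted R² for each size. The criterion used by the software in the study is not stated (Mallows' Cp is another common choice), and the data here are synthetic.

```python
import itertools

import numpy as np


def best_subsets(X, y, names):
    """Exhaustive best subset search: for each subset size k, keep the
    predictor combination with the highest adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    best = {}
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = np.sum((y - A @ beta) ** 2)
            adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
            if k not in best or adj_r2 > best[k][0]:
                best[k] = (adj_r2, [names[c] for c in cols])
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2]          # synthetic: only x0 and x2 matter
for k, (adj_r2, subset) in best_subsets(X, y, ["x0", "x1", "x2", "x3"]).items():
    print(k, subset, round(adj_r2, 4))
```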

Classification and Regression Tree (CART) based 
models are exploratory techniques for uncovering 
structure in data that are used: 

• To develop prediction rules that can be 
rapidly evaluated 

• To screen variables 

• To assess the adequacy of linear models 

• To summarize large data sets for both 
classification and regression problems. 

Figure 1 displays the resulting CART tree. CART
selected velocity, Mach number, altitude, energy and
dynamic pressure as the primary prediction variables.
The tree indicates that for data cases with velocity
below 13,750 ft/sec, the predicted heat rates are
generally below 30 BTU/ft²/s. Furthermore, if the
Mach number is also below 7.7, the predicted heat rate
is approximately 3.909 BTU/ft²/s.
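Each node of a regression tree like the one in Figure 1 is grown by searching for the single threshold that best separates the responses. The following minimal sketch shows that search for one predictor. The velocity and heat rate values are hypothetical. A full CART implementation also searches over all predictors, splits recursively, and prunes the tree.

```python
import numpy as np


def best_split(x, y):
    """Best single split 'x < threshold' for a regression tree node.

    Chooses the threshold that minimizes the total within-node sum of
    squares; each side then predicts its mean response.
    Returns (threshold, left_mean, right_mean).
    """
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    best_sse, best = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                              # no threshold separates ties
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            threshold = 0.5 * (xs[i - 1] + xs[i])  # midpoint between neighbors
            best_sse, best = sse, (threshold, left.mean(), right.mean())
    return best


# Hypothetical velocities (ft/sec) and heat rates (BTU/ft^2/s).
velocity = [1000, 5000, 9000, 12500, 15000, 18000, 21000, 25000]
heat = [3, 4, 4, 5, 30, 32, 40, 41]
print(best_split(velocity, heat))   # splits between 12500 and 15000
```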

Figure 1. Classification and Regression Tree 
for Corridor Data 


The classical regression methods are often used to obtain 
models for prediction. The challenge is the development 
of the best mathematical expression to describe in some 
sense the behavior of a random variable of interest as a 
function of one or more independent or predictor 
variables. The classical regression techniques, however,
make several strong assumptions about the underlying
data, and the data can fail to satisfy these assumptions in
several different ways, as indicated in the Analysis
Methods section.

When there are one or more outliers in the data, or
the data are not fitted well by a linear model, robust
regression methods come into play. These methods
reduce the influence of outliers and can also help
identify the outliers in the data.
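The paper does not name a particular robust method. One common choice, shown here as an illustrative substitute, is Huber M-estimation solved by iteratively reweighted least squares. Observations with large residuals receive smaller weights, so a single outlier cannot pull the fit far, and the final weights flag the suspect points. The tuning constant 1.345 and the median-absolute-deviation scale are conventional defaults, not values from the study. The data are hypothetical.

```python
import numpy as np


def huber_regression(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Robust linear regression by Huber M-estimation, solved with
    iteratively reweighted least squares. Returns (coefficients, weights)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)        # start from OLS
    w = np.ones(len(y))
    for _ in range(max_iter):
        r = y - A @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # robust (MAD) scale
        if scale == 0.0:
            break
        u = np.abs(r) / (c * scale)
        w = 1.0 / np.maximum(u, 1.0)                    # Huber weights: 1 or c*scale/|r|
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w


x = np.arange(10.0)
y = 2.0 * x + 1.0
y[7] += 50.0                                  # one gross outlier (hypothetical)
ols = np.polyfit(x, y, 1)                     # [slope, intercept]
beta, w = huber_regression(x, y)
print("OLS slope:", ols[0], "Huber slope:", beta[1], "outlier weight:", w[7])
```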

Scatterplot smoothers are useful tools for fitting arbitrary
smooth functions to a scatter plot of data points. The
smoother summarizes the trend of the response as a
function of the predictor variables. All of the above
analysis methods were used to explore the given data sets.


[Figure 1 tree diagram: splits on velocity, Mach, altitude, energy and dynamic pressure; terminal-node heat rate predictions include 3.909, 15.590, 26.900 and 39.810.]
Results


Several approaches were used in the exploration and 
identification of statistical and mathematical models for 
the given data sets including the classical multiple 
regression, classification and regression trees (CART), 
and several graphical methods. One of the interesting 
results concerns a comparison of some initial multiple 
regression model types using only the independent 
predictor variables for both full and corridor data sets. 
These results are summarized in Table A below, which
lists the data set, coefficient of determination (R²),
adjusted R², fit standard error S, F-statistic and
model type.

Table A: Comparison of Model Types for Full and 
Corridor Data using Three (3) Independent Predictor 
Variables 


Data | R² | Adj R² | Std Error | F-statistic | Model
Full | 48.3% | 48.2% | 196.5 | 394.15 | Linear
Corridor | 85.2% | 84.9% | 8.7 | 257.21 | Linear
Full | 84.4% | 84.4% | 108.0 | 1140.45 | Quadratic
Corridor | 97.6% | 97.5% | 3.54 | 895.68 | Quadratic
Full | 85.6% | 85.5% | 103.9 | 937.21 | Cubic & Inter
Corridor | 99.7% | 99.6% | 1.34 | 4751.89 | Cubic & Inter


Table A suggests that, for every model type, the
regression models fit the corridor data considerably
better than the full trajectory data.

A more in-depth analysis was conducted on the corridor
data set, as it simulates a possible entry trajectory and
the heating rates of an experimental space vehicle. The
structured approach used for model identification
consisted of the following steps: (1) analyze the
summary statistics for errors and consistency,
(2) conduct an analysis using only the independent
predictor variables, (3) use graphical methods to
identify underlying relationships, (4) employ methods
(best subset selection, principal components, etc.)
that help identify the most likely additional predictor
variables, and (5) specify a model using classical
regression, classification and regression trees (CART)
and other statistical methods. The results are
summarized in Table B below, which lists rank, R²,
adjusted R², fit standard error (S), F-statistic and
model type. For illustrative purposes, Tables 1 and 2
and Figures 2 and 3 in the Appendix present some of
the graphical and regression modeling methods that
were considered for model selection.


Table B: Identification of Model Types for 
the Corridor Data Set 


Rank | R² (%) | Adj R² (%) | Std Error | F-statistic | Model
1 | 99.8 | 99.7 | 1.130 | 4140.87 | Quad & Inter (5V)
2 | 99.7 | 99.6 | 1.348 | 4196.66 | Cubic & Inter (3V)
3 | 99.4 | 99.4 | 1.724 | 3843.80 | Linear (6V)
4 | 99.4 | 99.3 | 1.851 | 1665.11 | Cubic WO Ind (7V)
5 | 99.1 | 99.0 | 2.209 | 2798.68 | Linear 1 (5V)
6 | 97.6 | 97.5 | 3.54 | 895.68 | Quad & Inter (3V)
7 | 96.77 | - | 4.1533 | 483.15 | CART
8 | 85.2 | 84.9 | 8.72 | 257.21 | Linear (3V)


Conclusions 

A number of methods were considered in this analysis,
including classical multiple regression, polynomial
regression, classification and regression trees (CART),
principal components, correlation matrices and residual
analysis. Several graphical methods were used in model
development and in assessing model adequacy. In addition,
several techniques were used to screen and identify
underlying relationships. For example, the matrix plots
suggested including quadratic and interaction terms
in our models, while principal components
and best subset selection were used to screen and
identify the main predictor variables. All of these
methods guided the selection of the predictor
variables in our models. Using these methods, we
identified several promising candidate models that may
be used to predict the response variable, heat rate, for
the entry corridor data set. In the next phase of our work,
the adequacy of our models will be validated and other
advanced methods will be explored.
APPENDIX


Table 1. Regression Model Corridor Data (5V) 


The regression equation is

Heat Rate (BTU/ft²/s) = 66.3 - 0.871 Altitude (kft) + 0.0246 Velocity (ft/sec) - 14.4 Mach
                        - 0.162 Dyn. Pres. (lb/ft²) + 0.000054 Reynolds (per ft)

Predictor | Coef | SE Coef | T | P
Constant | 66.274 | 5.196 | 12.76 | 0.000
Altitude | -0.87060 | 0.03237 | -26.89 | 0.000
Velocity | 0.0246292 | 0.0005425 | 45.40 | 0.000
Mach | -14.4126 | 0.5654 | -25.49 | 0.000
Dyn. Pres. | -0.16206 | 0.01120 | -14.47 | 0.000
Reynolds | 0.00005438 | 0.00000767 | 7.09 | 0.000

S = 2.209   R-Sq = 99.1%   R-Sq(adj) = 99.0%
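The fitted equation and the T column can be used directly. This sketch encodes the reported coefficients and recomputes each T ratio as the coefficient divided by its standard error. The function name `predict_heat_rate` is hypothetical. As with any regression fit, predictions should not be expected to hold outside the range of the corridor data used to build the model.

```python
# Coefficients and standard errors as reported in Table 1 (Regression Model, 5V).
COEFFICIENTS = {
    "Constant": (66.274, 5.196),
    "Altitude": (-0.87060, 0.03237),
    "Velocity": (0.0246292, 0.0005425),
    "Mach": (-14.4126, 0.5654),
    "Dyn. Pres.": (-0.16206, 0.01120),
    "Reynolds": (0.00005438, 0.00000767),
}


def predict_heat_rate(altitude_kft, velocity_ftps, mach, dyn_pres_psf, reynolds_per_ft):
    """Heat rate (BTU/ft^2/s) from the fitted 5-variable corridor model."""
    inputs = {
        "Constant": 1.0,
        "Altitude": altitude_kft,
        "Velocity": velocity_ftps,
        "Mach": mach,
        "Dyn. Pres.": dyn_pres_psf,
        "Reynolds": reynolds_per_ft,
    }
    return sum(COEFFICIENTS[name][0] * value for name, value in inputs.items())


# T = coefficient / standard error; agrees with the T column to within rounding.
for name, (coef, se) in COEFFICIENTS.items():
    print(f"{name:10s} T = {coef / se:7.2f}")
```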


Table 2. Analysis of Variance (5 V) 


Source | DF | SS | MS | F | P
Regression | 5 | 68258 | 13652 | 2798.68 | 0.000
Residual Error | 132 | 644 | 5 | |
Total | 137 | 68902 | | |
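The summary statistics in Table 1 follow from the analysis of variance sums of squares. This check assumes 5 regression degrees of freedom (one per predictor) and 138 - 5 - 1 = 132 residual degrees of freedom for the 138-measurement corridor data set. With those values it recovers S = 2.209, R-Sq = 99.1% and R-Sq(adj) = 99.0%. It also gives an F value that matches 2798.68 up to the rounding of the tabulated sums of squares.

```python
import math

ss_reg, ss_res = 68258.0, 644.0      # sums of squares from the ANOVA table
df_reg, df_res = 5, 132              # assumed: 5 predictors, n = 138 measurements
ss_tot = ss_reg + ss_res             # 68902

ms_reg = ss_reg / df_reg             # mean square, regression
ms_res = ss_res / df_res             # mean square error
f_stat = ms_reg / ms_res
r_sq = ss_reg / ss_tot
r_sq_adj = 1.0 - ms_res / (ss_tot / (df_reg + df_res))
s = math.sqrt(ms_res)                # fit standard error

print(f"F = {f_stat:.0f}, R-Sq = {100 * r_sq:.1f}%, "
      f"R-Sq(adj) = {100 * r_sq_adj:.1f}%, S = {s:.3f}")
# -> F = 2798, R-Sq = 99.1%, R-Sq(adj) = 99.0%, S = 2.209
```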







Figure 3. Normal Probability Plot (5V) 


Normal Probability Plot of the Residuals 

(response is Heat Rate) 
Figure 4. Residual Plot (5V)
Residuals Versus the Fitted Values 

(response is Heat Rate) 